Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features

Authors

  • Sharath Adavanne
  • Giambattista Parascandolo
  • Pasi Pertilä
  • Toni Heittola
  • Tuomas Virtanen
Abstract

In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task. Real-life sound recordings typically contain many overlapping sound events, making them hard to recognize with mono-channel audio alone. Human listeners successfully recognize mixtures of overlapping sound events by using pitch cues and by exploiting the stereo (multichannel) audio signal available at their ears to spatially localize these events. Traditionally, SED systems have used only mono-channel audio; motivated by the human listener, we propose to extend them to use multichannel audio. The proposed SED system is compared against the state-of-the-art mono-channel method on the development subset of the TUT sound events detection 2016 database [1]. The use of spatial and harmonic features is shown to improve the performance of SED.
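
As a rough illustration of what a spatial feature from stereo audio can look like, the sketch below estimates the time delay between two channels with GCC-PHAT (generalized cross-correlation with phase transform). The abstract does not say which spatial features the authors compute, so the choice of GCC-PHAT, the function name gcc_phat_delay, and the max_delay_s parameter are all assumptions made for this example, not the paper's method.

```python
import numpy as np


def gcc_phat_delay(left, right, sample_rate, max_delay_s=0.001):
    """Estimate the delay (in seconds) of `right` relative to `left`.

    Hypothetical helper for illustration only; a positive result means
    the sound reached the right channel later than the left channel.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(left) + len(right)
    left_spec = np.fft.rfft(left, n=n)
    right_spec = np.fft.rfft(right, n=n)

    # Cross-power spectrum, whitened so that only phase remains (PHAT).
    cross = right_spec * np.conj(left_spec)
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.irfft(cross, n=n)

    # Keep only physically plausible lags: -max_shift .. +max_shift samples.
    max_shift = max(1, int(max_delay_s * sample_rate))
    corr = np.concatenate((corr[-max_shift:], corr[:max_shift + 1]))

    lag = int(np.argmax(corr)) - max_shift
    return lag / sample_rate
```

In a detection pipeline, a delay of this kind would typically be computed per short frame and stacked with the spectral features fed to the network; how the paper combines its features is not stated in the abstract.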

Similar resources

Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

In this paper, we propose a stacked convolutional and recurrent neural network (CRNN) with a 3D convolutional neural network (CNN) in the first layer for the multichannel sound event detection (SED) task. The 3D CNN enables the network to simultaneously learn the inter- and intra-channel features from the input multichannel audio. In order to evaluate the proposed method, multichannel audio datas...

Spatial audio and sensory evaluation techniques – context, history and aims

Spatial sound reproduction gives rise to new challenges for those trying to evaluate sensory features contributing to perceived quality. Recent technical developments have enabled the delivery of sophisticated multichannel audio signals to consumers, over links that range very widely in quality, requiring decisions to be made about the tradeoffs between different aspects of audio quality. Spati...

Comparison of Quality Degradation Effects Caused by Limitation of Bandwidth and by Down-mix Algorithms in Consumer Multichannel Audio Delivery Systems

The comparative effect on audio quality of controlled multichannel audio bandwidth limitation and selected downmix algorithms was examined. The investigation was focused on the standard 5.1 multichannel audio set-up (Rec. ITU-R BS.775-1) and was limited to the optimum listening position. The obtained results indicate that in case of multichannel audio systems spatial quality is less important t...

Parametric Coding of Stereo Audio Based on Principal Component Analysis

Low bit rate parametric coding of multichannel audio is mainly based on Binaural Cue Coding (BCC). Another multichannel audio processing method called upmix can also be used to deliver multichannel audio, typically 5.1 signals, at low data rates. More precisely, we focus on an existing upmix method based on Principal Component Analysis (PCA). This PCA-based upmix method aims at blindly creating a re...

Psychoacoustic-based quantisation of spatial audio cues

The derivation of spatial cues representing source localisation information is a typical component of multichannel spatial audio coders. Efficient compression of spatial cues based on psychoacoustic localisation features is investigated. Results show that the proposed quantisation approach for spatial cue compression achieves bit-rates of less than 6 kbit/s while preserving critical source loca...

Journal title:
  • CoRR

Volume abs/1706.02293  Issue 

Pages  -

Publication date 2016